AITopics | oov word

oov word

Handling Korean Out-of-Vocabulary Words with Phoneme Representation Learning

Kim, Nayeon, Jeon, Eojin, Park, Jun-Hyung, Lee, SangKeun

arXiv.org Artificial Intelligence, Jul-8-2025

In this study, we introduce KOPL, a novel framework for handling Korean out-of-vocabulary (OOV) words with Phoneme representation Learning. Our work builds on a linguistic property of Korean as a phonemic script: the high correlation between phonemes and letters. KOPL combines phoneme and word representations for Korean OOV words, so that the resulting OOV word representations capture both the text and the phoneme information of each word. We empirically demonstrate that KOPL significantly improves performance on Korean Natural Language Processing (NLP) tasks while being readily integrated into existing static and contextual Korean embedding models in a plug-and-play manner. Notably, we show that KOPL outperforms the state-of-the-art model by an average of 1.9%. Our code is available at https://github.com/jej127/KOPL.git.
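
As a rough illustration of the letter-level structure the abstract relies on, the sketch below splits precomposed Hangul syllables into their constituent letters (jamo) using the standard Unicode arithmetic for the Hangul Syllables block. It is not the KOPL method: the function name `decompose` is hypothetical, and treating jamo as a stand-in for phonemes is a simplifying assumption.

```python
# Standard Unicode layout: 19 initial consonants, 21 vowels, 27 optional finals.
LEADS = [chr(0x1100 + i) for i in range(19)]
VOWELS = [chr(0x1161 + i) for i in range(21)]
TAILS = [""] + [chr(0x11A8 + i) for i in range(27)]

HANGUL_BASE = 0xAC00          # first precomposed syllable
HANGUL_COUNT = 19 * 21 * 28   # 11172 precomposed syllables


def decompose(word):
    """Split each precomposed Hangul syllable into jamo; pass other characters through."""
    jamo = []
    for ch in word:
        code = ord(ch) - HANGUL_BASE
        if 0 <= code < HANGUL_COUNT:
            lead, rest = divmod(code, 21 * 28)
            vowel, tail = divmod(rest, 28)
            jamo.extend([LEADS[lead], VOWELS[vowel]])
            if tail:  # tail index 0 means "no final consonant"
                jamo.append(TAILS[tail])
        else:
            jamo.append(ch)
    return jamo


if __name__ == "__main__":
    # Any Korean word, seen in training or not, maps to a short letter sequence.
    print(decompose("한국어"))
```

Because every syllable decomposes into a small closed set of letters, an unseen word still yields a sequence over a fully known inventory, which is one reason sub-syllable representations are attractive for OOV handling.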

artificial intelligence, machine learning, natural language, (18 more...)

doi: 10.1007/978-981-96-8180-8_38

2507.04018

Country:

Europe (0.46)
Asia > South Korea (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Evaluation Of P300 Speller Performance Using Large Language Models Along With Cross-Subject Training

Parthasarathy, Nithin, Soetedjo, James, Panchavati, Saarang, Parthasarathy, Nitya, Arnold, Corey, Pouratian, Nader, Speier, William

arXiv.org Artificial Intelligence, Oct-19-2024

Amyotrophic lateral sclerosis (ALS), a progressive neuromuscular degenerative disease, severely restricts patient communication capacity within a few years of onset, resulting in a significant deterioration of quality of life. The P300 speller brain-computer interface (BCI) offers an alternative communication medium by leveraging a subject's EEG response to characters that are traditionally highlighted on a character grid in a graphical user interface (GUI). A recurring theme in P300-based research is enhancing performance to enable faster subject interaction. This study builds on that theme by addressing key limitations, particularly in the training of multi-subject classifiers, and by integrating advanced language models to optimize stimuli presentation and word prediction, thereby improving communication efficiency. Specifically, large language models such as Generative Pre-Trained Transformer (GPT2), BERT, and BART, alongside Dijkstra's algorithm, are used to optimize stimuli and provide word completion choices based on the spelling history. In addition, a multi-layered smoothing approach is applied to allow for out-of-vocabulary (OOV) words. By conducting extensive simulations based on randomly sampled EEG data from subjects, we show substantial speed improvements in typing passages that include rare and OOV words, with the extent of improvement varying with the language model used. The gains from such character-level interface optimizations are approximately 10%, and GPT2 for multi-word prediction provides gains of around 40%. In particular, some large language models achieve performance levels within 10% of the theoretical performance limits established in this study. In addition, both within-subject and across-subject training techniques are explored, and speed improvements are shown to hold in both cases.
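
The abstract does not spell out its multi-layered smoothing scheme, so the following is only a minimal sketch of the general idea, assuming a two-layer interpolation: a word-completion layer built from a vocabulary, backed by a uniform character layer that keeps every character reachable. The function `next_char_probs`, the alphabet, and the weight `lam` are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
import string

# Assumed speller alphabet: lowercase letters plus space (end of word).
ALPHABET = string.ascii_lowercase + " "


def next_char_probs(prefix, word_counts, lam=0.9):
    """Interpolate a word-completion layer with a uniform character layer."""
    # Layer 1: count the next character of every vocabulary word extending the prefix.
    layer1 = Counter()
    for word, count in word_counts.items():
        if word.startswith(prefix):
            nxt = word[len(prefix)] if len(word) > len(prefix) else " "
            layer1[nxt] += count
    total = sum(layer1.values())

    # Layer 2: uniform fallback, so an OOV continuation never gets zero probability.
    uniform = 1.0 / len(ALPHABET)
    weight = lam if total else 0.0  # no vocabulary match: rely on the fallback only
    return {
        c: weight * (layer1[c] / total if total else 0.0) + (1.0 - weight) * uniform
        for c in ALPHABET
    }


if __name__ == "__main__":
    counts = {"the": 5, "then": 3, "cat": 2}
    probs = next_char_probs("the", counts)
    print(max(probs, key=probs.get) == " ")  # the most likely continuation ends the word
```

With `prefix="the"`, most of the mass goes to the space and `n` continuations, yet a rare character such as `z` keeps a small nonzero probability, so a word missing from the vocabulary can still be spelled.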

large language model, machine learning, natural language, (20 more...)

2410.15161

Country:

North America > United States > Texas (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(4 more...)

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine > Therapeutic Area > Neurology > Amyotrophic Lateral Sclerosis (ALS) (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for Practical Applications through Low-Effort Data Strategies

Anand, Srija, Varadhan, Praveen Srinivasa, Sankar, Ashwin, Raju, Giri, Khapra, Mitesh M.

arXiv.org Artificial Intelligence, Jul-18-2024

Publicly available TTS datasets for low-resource languages like Hindi and Tamil typically contain 10-20 hours of data, leading to poor vocabulary coverage. This limitation becomes evident in downstream applications, where domain-specific vocabulary coupled with frequent code-mixing with English results in many OOV words. To highlight this problem, we create a benchmark containing OOV words from several real-world applications. Indeed, state-of-the-art Hindi and Tamil TTS systems perform poorly on this OOV benchmark, as indicated by intelligibility tests. To improve the model's OOV performance, we propose a low-effort and economically viable strategy to obtain more training data. Specifically, we propose using volunteers, as opposed to high-quality voice artists, to record words containing character bigrams unseen in the training data. We show that with such inexpensive data, the model's performance improves on OOV words without affecting voice quality or in-domain performance.
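
The data-selection step described above (picking words that contain character bigrams unseen in training) can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code: the function names are hypothetical, and working on raw code points is a simplification, since Indic scripts would more plausibly need grapheme-cluster segmentation first.

```python
def char_bigrams(word):
    """Set of adjacent character pairs in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def words_with_unseen_bigrams(training_words, candidate_words):
    """Return (word, unseen_bigrams) for candidates that add new bigram coverage."""
    seen = set()
    for word in training_words:
        seen |= char_bigrams(word)

    selected = []
    for word in candidate_words:
        unseen = char_bigrams(word) - seen
        if unseen:
            selected.append((word, sorted(unseen)))
    return selected


if __name__ == "__main__":
    train = ["data", "model"]
    candidates = ["date", "mode", "cloud"]
    for word, unseen in words_with_unseen_bigrams(train, candidates):
        print(word, unseen)
```

Here `mode` is skipped because all of its bigrams already occur in the training words, while `date` and `cloud` would be queued for recording.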

application, category, oov word, (13 more...)

2407.13435

Country:

Asia > India (0.05)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)

High Performance P300 Spellers Using GPT2 Word Prediction With Cross-Subject Training

Parthasarathy, Nithin, Soetedjo, James, Panchavati, Saarang, Parthasarathy, Nitya, Arnold, Corey, Pouratian, Nader, Speier, William

arXiv.org Artificial Intelligence, May-22-2024

Amyotrophic lateral sclerosis (ALS) severely impairs patients' ability to communicate, often leading to a decline in their quality of life within a few years of diagnosis. The P300 speller brain-computer interface (BCI) offers an alternative communication method by interpreting a subject's EEG response to characters presented on a grid interface. This paper addresses the common speed limitations encountered in training efficient P300-based multi-subject classifiers by introducing innovative "across-subject" classifiers. We leverage a combination of the second-generation Generative Pre-Trained Transformer (GPT2) and Dijkstra's algorithm to optimize stimuli and suggest word completion choices based on typing history. Additionally, we employ a multi-layered smoothing technique to accommodate out-of-vocabulary (OOV) words. Through extensive simulations involving random sampling of EEG data from subjects, we demonstrate significant speed enhancements in typing passages containing rare and OOV words. These optimizations result in approximately 10% improvement in character-level typing speed and up to 40% improvement in multi-word prediction. We demonstrate that augmenting standard row/column highlighting techniques with layered word prediction yields close-to-optimal performance. Furthermore, we explore both "within-subject" and "across-subject" training techniques, showing that speed improvements are consistent across both approaches.

bit minute, flashboard, language model, (16 more...)

2405.13329

Country:

North America > United States > New York (0.04)
North America > United States > Texas (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology > Amyotrophic Lateral Sclerosis (ALS) (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Automatic Heteronym Resolution Pipeline Using RAD-TTS Aligners

Huang, Jocelyn, Bakhturina, Evelina, Tatanov, Oktai

arXiv.org Artificial Intelligence, Feb-28-2023

Grapheme-to-phoneme (G2P) transduction is part of the standard text-to-speech (TTS) pipeline. However, G2P conversion is difficult for languages that contain heteronyms -- words that have one spelling but can be pronounced in multiple ways. G2P datasets with annotated heteronyms are limited in size and expensive to create, as human labeling remains the primary method for heteronym disambiguation. We propose a RAD-TTS Aligner-based pipeline to automatically disambiguate heteronyms in datasets that contain both audio with text transcripts. The best pronunciation can be chosen by generating all possible candidates for each heteronym and scoring them with an Aligner model. The resulting labels can be used to create training datasets for use in both multi-stage and end-to-end G2P systems.

artificial intelligence, machine learning, natural language, (18 more...)

2302.14523

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.46)
(2 more...)

Revisit Out-Of-Vocabulary Problem for Slot Filling: A Unified Contrastive Frameword with Multi-level Data Augmentations

Guo, Daichi, Dong, Guanting, Fu, Dayuan, Wu, Yuxiang, Zeng, Chen, Hui, Tingfeng, Wang, Liwen, Li, Xuefeng, Wang, Zechen, He, Keqing, Cui, Xinyue, Xu, Weiran

arXiv.org Artificial Intelligence, Feb-27-2023

In real dialogue scenarios, existing slot filling models, which tend to memorize entity patterns, generalize significantly worse when facing Out-of-Vocabulary (OOV) problems. To address this issue, we propose an OOV-robust slot filling model based on multi-level data augmentations that tackles the OOV problem from both the word and the slot perspective. We present a unified contrastive learning framework, which pulls the representations of the original sample and its augmented samples together, to make the model resistant to OOV problems. We evaluate the model on specific slots and carefully design test data with OOV word perturbations to further demonstrate its effectiveness on OOV words. Experiments on two datasets show that our approach outperforms the previous state-of-the-art (SOTA) methods on both OOV slots and OOV words.
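
The abstract does not name its contrastive objective. As a hedged illustration of what "pulling an original sample and its augmentations together" usually means, the sketch below uses the widely used InfoNCE loss as a stand-in; the vectors, the temperature, and the function names are assumptions made for this example only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: low when the anchor is closer to its augmentation than to other samples."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


if __name__ == "__main__":
    anchor = [1.0, 0.0]                      # representation of the original utterance
    negatives = [[0.0, 1.0], [-1.0, 0.0]]    # representations of unrelated utterances
    print(info_nce(anchor, [0.9, 0.1], negatives))  # augmentation stays close: small loss
    print(info_nce(anchor, [0.1, 0.9], negatives))  # augmentation drifts away: larger loss
```

Minimizing such a loss pushes the encoder to map an utterance and its OOV-perturbed variant to nearby points, which is the robustness property the abstract is after.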

artificial intelligence, machine learning, natural language, (19 more...)

2302.13584

Country:

North America > Curaçao (0.05)
Asia > China > Beijing > Beijing (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition

Qu, Leyuan, Weber, Cornelius, Wermter, Stefan

arXiv.org Artificial Intelligence, Feb-21-2023

Due to the dynamic nature of human language, automatic speech recognition (ASR) systems need to continuously acquire new vocabulary. Out-Of-Vocabulary (OOV) words, such as trending words and new named entities, pose problems for modern ASR systems, which require long training times to adapt their large numbers of parameters. Unlike most previous research, which focuses on language model post-processing, we tackle this problem at an earlier processing level and eliminate the bias in acoustic modeling to recognize OOV words acoustically. We propose to generate OOV words using text-to-speech systems and to rescale losses to encourage neural networks to pay more attention to OOV words. Specifically, when fine-tuning a previously trained model on synthetic audio, we either enlarge the classification loss for utterances containing OOV words (sentence-level) or rescale the gradient used for back-propagation for OOV words (word-level). To overcome catastrophic forgetting, we also explore the combination of loss rescaling and model regularization, i.e. L2 regularization and elastic weight consolidation (EWC). Compared with previous methods that just fine-tune on synthetic audio with EWC, the experimental results on the LibriSpeech benchmark reveal that our proposed loss rescaling approach can achieve a significant improvement in recall rate with only a slight decrease in word error rate performance. Moreover, word-level rescaling is more stable than utterance-level rescaling and leads to higher recall rates and precision on OOV word recognition. Furthermore, our proposed combined loss rescaling and weight consolidation methods can support continual learning of an ASR system.
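
To make the two ingredients above concrete, here is a minimal sketch assuming the simplest possible forms: a per-token negative log-likelihood in which OOV tokens receive a larger weight (which scales their gradient contribution by the same factor), and the standard EWC quadratic penalty. The weight value, the function names, and the flat parameter lists are illustrative assumptions, not the paper's training code.

```python
import math


def rescaled_loss(token_probs, is_oov, oov_weight=2.0):
    """Mean negative log-likelihood with OOV tokens up-weighted (word-level rescaling)."""
    total = 0.0
    for prob, oov in zip(token_probs, is_oov):
        weight = oov_weight if oov else 1.0
        total += weight * -math.log(prob)
    return total / len(token_probs)


def ewc_penalty(params, old_params, fisher, lam=1.0):
    """Standard EWC term: (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    return 0.5 * lam * sum(
        f * (p - q) ** 2 for p, q, f in zip(params, old_params, fisher)
    )


if __name__ == "__main__":
    probs = [0.5, 0.5]          # model probability of each correct token
    oov_mask = [False, True]    # second token belongs to an OOV word
    task_loss = rescaled_loss(probs, oov_mask, oov_weight=2.0)
    reg = ewc_penalty([1.0, 2.0], [0.0, 2.0], [0.5, 3.0], lam=2.0)
    print(task_loss + reg)
```

The penalty grows only for parameters that both moved away from their pre-fine-tuning values and have high Fisher information, which is how EWC limits forgetting while the rescaled loss concentrates learning on the new words.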

machine learning, natural language, recognition, (17 more...)

doi: 10.1016/j.neunet.2023.01.027

2302.09723

Country:

Europe > United Kingdom (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
North America > Canada > Quebec > Montreal (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Towards Personalization of CTC Speech Recognition Models with Contextual Adapters and Adaptive Boosting

Dingliwal, Saket, Sunkara, Monica, Bodapati, Sravan, Ronanki, Srikanth, Farris, Jeff, Kirchhoff, Katrin

arXiv.org Artificial Intelligence, Nov-13-2022

End-to-end speech recognition models trained using joint Connectionist Temporal Classification (CTC)-Attention loss have gained popularity recently. In these models, a non-autoregressive CTC decoder is often used at inference time due to its speed and simplicity. However, such models are hard to personalize because of their conditional independence assumption, which prevents output tokens from previous time steps from influencing future predictions. To tackle this, we propose a novel two-way approach that first biases the encoder with attention over a predefined list of rare long-tail and out-of-vocabulary (OOV) words and then uses dynamic boosting and a phone alignment network during decoding to further bias the subword predictions. We evaluate our approach on the open-source VoxPopuli dataset and on in-house medical datasets, showing a 60% improvement in F1 score on domain-specific rare words over a strong CTC baseline.

artificial intelligence, machine learning, natural language, (16 more...)

2210.0951

Country: Europe (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.82)

Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model

Fox, Jennifer Drexler, Delworth, Natalie

arXiv.org Artificial Intelligence, Sep-2-2022

Contextual ASR, which takes a list of bias terms as input along with audio, has drawn recent interest as ASR use becomes more widespread. We are releasing contextual biasing lists to accompany the Earnings21 dataset, creating a public benchmark for this task. We present baseline results on this benchmark using a pretrained end-to-end ASR model from the WeNet toolkit. We show results for shallow fusion contextual biasing applied to two different decoding algorithms. Our baseline results confirm observations that end-to-end models struggle in particular with words that are rarely or never seen during training, and that existing shallow fusion techniques do not adequately address this problem. We propose an alternate spelling prediction model that improves recall of rare words by 34.7% relative and of out-of-vocabulary words by 97.2% relative, compared to contextual biasing without alternate spellings. This model is conceptually similar to ones used in prior work, but is simpler to implement as it does not rely on either a pronunciation dictionary or an existing text-to-speech system.

bias term, contextual, training data, (15 more...)

2209.0125

Country:

Asia > China (0.04)
Oceania > Australia (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.34)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.31)

How Effective is Byte Pair Encoding for Out-Of-Vocabulary Words in Neural Machine Translation?

Araabi, Ali, Monz, Christof, Niculae, Vlad

arXiv.org Artificial Intelligence, Aug-17-2022

Neural Machine Translation (NMT) is an open vocabulary problem. As a result, dealing with words that do not occur during training (a.k.a. out-of-vocabulary (OOV) words) has long been a fundamental challenge for NMT systems. The predominant method to tackle this problem is Byte Pair Encoding (BPE), which splits words, including OOV words, into sub-word segments. BPE has achieved impressive results for a wide range of translation tasks in terms of automatic evaluation metrics. While it is often assumed that by using BPE, NMT systems are capable of handling OOV words, the effectiveness of BPE in translating OOV words has not been explicitly measured. In this paper, we study to what extent BPE is successful in translating OOV words at the word level. We analyze the translation quality of OOV words based on word type, number of segments, cross-attention weights, and the frequency of segment n-grams in the training data. Our experiments show that while careful BPE settings seem to be fairly useful in translating OOV words across datasets, a considerable percentage of OOV words are translated incorrectly. Furthermore, we highlight the slightly higher effectiveness of BPE in translating OOV words in special cases, such as named entities and language pairs that are linguistically close to each other.
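
For readers unfamiliar with how BPE segments a word it has never seen, the sketch below learns merge rules from a toy word-frequency table and then applies them, in learned order, to an unseen word. It is a simplified textbook version of the algorithm; the toy corpus, the `</w>` end-of-word marker convention, and the function names are assumptions for this example and are unrelated to the paper's experimental setup.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_counts, num_merges):
    """Greedily learn merges: repeatedly fuse the most frequent adjacent symbol pair."""
    vocab = {tuple(word) + ("</w>",): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = {merge_pair(symbols, best): count for symbols, count in vocab.items()}
    return merges


def segment(word, merges):
    """Segment a (possibly unseen) word by applying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges = learn_bpe(corpus, num_merges=10)
    print(segment("lowest", merges))  # "lowest" never appears in the toy corpus
```

The unseen word is always representable, because in the worst case it falls back to single characters; whether the resulting segments are translated correctly is exactly the question the paper examines.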

oov word, translation, translation quality, (12 more...)

2208.05225

Country:

Europe > Italy > Tuscany > Florence (0.04)
Europe > Germany > Berlin (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(21 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
